Kalman-filtering using local interactions



Abstract 

There is growing interest in using Kalman-filter models for brain modelling. It
is therefore of considerable importance to represent the Kalman-filter in
connectionist form with local Hebbian learning rules. To the best of our
knowledge, the Kalman-filter has not yet been given such a local representation.
The main obstacle appears to be the dynamic adaptation of the Kalman-gain. Here,
a connectionist representation is presented, which is derived by means of the
recursive prediction error method. We show that this method gives rise to
attractive local learning rules and can adapt the Kalman-gain.
1 Introduction



Linear dynamical systems (LDS) are well studied and widely applied tools in
both state estimation and control. Inference in an LDS is simple, unbiased
and of minimal covariance if the Kalman-filter recursion is used. Recently,
there has been growing interest in Kalman-filters or Kalman-filter-like
structures as models for neurobiological substrates. It has been suggested that
Kalman-filtering (i) may occur in sensory processing ([1,2]), (ii) may be the
underlying computation of the hippocampus ([3]), and (iii) may be the underlying
principle in control architectures ([4]). Detailed architectural similarities
between the Kalman-filter and the entorhinal-hippocampal loop, as well as
between Kalman-filters and the neocortical hierarchy, have been described
recently ([5,6]). The interplay between the dynamics of Kalman-filter-like
architectures and the learning of parameters of neuronal networks has promising
aspects for explaining known and puzzling phenomena, such as priming, repetition
suppression and categorization ([7,8]).

The Kalman-filter is an on-line recursive algorithm. Unfortunately,
Kalman-filtering requires the computation of the Kalman-gain. Recursions of the
Kalman-gain matrix assume that the covariance matrices of the process noise and
the observation noise are known. In general, these parameter sets are not known
in advance and may be subject to temporal changes. Moreover, to determine the
Kalman-gain, the algorithm requires the inversion of matrices, which is hard to
interpret in neurobiological terms. To the best of our knowledge, all networks
suggested so far compute the Kalman-gain matrix directly, e.g., by using matrix
inversions.

Here, an alternative route, the recursive prediction error (RPE) method ([9]),
is followed. Using this method, we were able to construct a special
parametrization, which (i) is on-line and recursive and (ii) makes use of local
interactions to estimate the filtering parameters, including the Kalman-gain.
The next section (Section 2) reviews background material: the constraints on
connectionist systems (Section 2.1) and the well-known Kalman-filter recursion
(Section 2.2). In Section 2.3 the recursive prediction error method is applied
to estimate the Kalman-gain. Our particular parametrization and the
corresponding architecture are provided in Section 3 and Section 3.1,
respectively. Conclusions are drawn in the last section (Section 4). A detailed
mapping to the neural substrate is not our aim here: the mapping would differ
considerably depending on whether the target is (i) the control system of the
brain, (ii) the hippocampus, (iii) the hippocampal-entorhinal loop, or (iv) the
neocortical layers, etc. Any suggestion on Kalman-filtering needs to map a
connectionist architecture to a part of the brain.



2 Background 

2.1 Constraints on connectionist systems 

Connectionist systems are special non-linear systems having graph-like
structures. Nodes of the graph are called neurons, whereas directed edges are
the connections. Figure 1 depicts a neuron subject to local interactions. An
interaction is called local if it is exerted by a directed connection. The
targeted neuron is the subject of the interaction. The end of the directed
connection is depicted by a small circle, called a synapse. Here, a synapse can
be of three types: excitatory, inhibitory or multiplying. The first two of these
types are widely considered as neuronal operations. A multiplying synapse is,
however, also possible: a neuron may affect another neuron by modulating the
'gain' of that neuron in a multiplicative way ([10]). For a review on the
different processing capabilities of single neurons, see, e.g., [11].
Fig. 1. Local interactions of neurons

The node (or neuron) is depicted by the large grey circle. Connections to and
from other neurons are depicted by lines; these are called directed connections.
The direction of a connection is denoted either by an arrow or by a small
circle. The small circle is the synapse of the connection at the targeted
neuron. The interaction (i.e., the connection) targets the neuron that (i) is in
the direction of the arrow, or (ii) is at the circle, or (iii) is targeted by a
connection being targeted by the small circle, and so on. The synapse can be of
two basic types here: additive or multiplying. An additive synapse can be either
excitatory or inhibitory. We assume that only the excitatory synapses can be
adapted. At the same time, we also assume that feedforward inhibition, which is
always present ([12]), plays a role and that learning occurs relative to a
negative background. In turn, feedforward inhibitory synapses (not shown in our
figures) together with adapting excitatory synapses may exert an effective
inhibition. That is, matrices representing excitatory synapse sets may have
negative elements.

From the point of view of neuronal modelling, 

(1) the series of observations constitute the input to the model, 

(2) the learning task corresponds to finding the best parametrization and
the best hidden variables given all past observations and subject to
constraints prescribed by the norm (the measure) of a model and the noise
assumed by this model, whereas

(3) filtering corresponds to the estimation of the hidden variables given the 
past observations. 
The seminal work of Hebb ([13]) was a brilliant first attempt in its time to
link neurophysiology with higher order behavioral phenomena studied by
psychology. Its central thesis postulated that changes in synaptic connection
strength are primarily based on the correlation between the pre- and
postsynaptic neural activities. Recent experiments ([14,15,16]; for a review,
see, e.g., [17]) revealed, however, that the exact timing and temporal dynamics
of the neural activities play a crucial role in forming the neuronal basis of
plasticity. The concept that describes these modified variants of the original
form of Hebbian learning is called spike-time dependent synaptic plasticity
(STDP). The term 'spike' denotes the fast potential change which propagates over
the neurons and induces synaptic neurotransmitter release, which, in turn, may
result in synaptic modifications. STDP means that strengthening occurs (i) if
the neuron fires and (ii) if the excitatory synapse targeting this neuron
delivered a spike within a narrow time window around the time of firing. On the
other hand, weakening occurs if the spike delivered at the excitatory synapse
falls outside of this short time window.



2.2 Kalman-filter recursion

Let us consider the following linear dynamical system (LDS): 



$$y_t = H x_t + n_t \qquad \text{observation process} \tag{2.1}$$
$$x_{t+1} = F x_t + m_t \qquad \text{dynamics of hidden variables} \tag{2.2}$$

where $m_t \propto \mathcal{N}(0, \Pi)$ and $n_t \propto \mathcal{N}(0, \Sigma)$
are independent noise processes. Here, notation $\mathcal{N}(m, \Sigma)$ is a
shorthand to denote a stochastic variable of expectation value $m$ and
covariance matrix $\Sigma$. Our task is the estimation of hidden variables
$x(t) \in R^n$ given the series of observations $y(\tau) \in R^p$, $\tau \le t$.
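To make the generative model of Eqs. 2.1-2.2 concrete, the following minimal sketch simulates a scalar instance of it. The function name `simulate_lds`, the argument names `pi_var` and `sigma_var` (standing for the scalar versions of $\Pi$ and $\Sigma$) and all numerical values are illustrative assumptions, not part of the original text.

```python
import random

def simulate_lds(F, H, pi_var, sigma_var, x0, steps, seed=0):
    """Simulate a scalar instance of the LDS of Eqs. 2.1-2.2.

    x_{t+1} = F x_t + m_t,  m_t ~ N(0, pi_var)     (hidden dynamics)
    y_t     = H x_t + n_t,  n_t ~ N(0, sigma_var)  (observation)
    """
    rng = random.Random(seed)
    x = x0
    xs, ys = [], []
    for _ in range(steps):
        xs.append(x)
        # Observation of the current hidden state (Eq. 2.1).
        ys.append(H * x + rng.gauss(0.0, sigma_var ** 0.5))
        # Propagate the hidden state one step (Eq. 2.2).
        x = F * x + rng.gauss(0.0, pi_var ** 0.5)
    return xs, ys

# Illustrative values only.
xs, ys = simulate_lds(F=0.9, H=1.0, pi_var=1.0, sigma_var=1.0, x0=0.0, steps=5)
```

Only the list `ys` would be available to a filter; `xs` is the hidden sequence that the filter tries to estimate.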

For estimations in squared (Euclidean) norm and Gaussian noise, the optimal
solution was derived by Rudolf Kalman ([18,19]). The Kalman-filter recursion
is reproduced here. Let $E$ and $\mathrm{Cov}$ denote the expectation value and
the covariance matrix operators, respectively. Let us introduce the following
notations: $\hat{x}(t|\tau) = E(x_t | y_1, \ldots, y_\tau)$,
$N_t = \mathrm{Cov}(x_t | y_1, \ldots, y_t)$, and
$M_t = \mathrm{Cov}(x_t | y_1, \ldots, y_{t-1})$.

Lemma 1 (Kalman-filter recursion) Assume that $\hat{x}(t-1|t-1)$ and $N_{t-1}$
have been determined. Then
$$\hat{x}(t|t-1) = F\hat{x}(t-1|t-1) \tag{2.3}$$
$$M_t = F N_{t-1} F^T + \Pi \tag{2.4}$$
$$K^f_t = M_t H^T \left(H M_t H^T + \Sigma\right)^{-1} \tag{2.5}$$
$$\hat{x}(t|t) = \hat{x}(t|t-1) + K^f_t \left(y_t - H F \hat{x}(t-1|t-1)\right) \tag{2.6}$$
$$N_t = \left(I - K^f_t H\right) M_t \tag{2.7}$$



where I denotes the identity transformation (i.e., the identity matrix) and 
superscript T denotes transposition. 
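For readers who prefer code to formulas, one pass through the recursion of Lemma 1 can be sketched for scalar $x$ and $y$, where the matrix inversion of Eq. 2.5 reduces to a division. The function name `kalman_step` and the argument names `pi_var` and `sigma_var` (scalar $\Pi$ and $\Sigma$) are hypothetical, and the numbers in the example call are chosen only so that the result is easy to check by hand.

```python
def kalman_step(x_prev, N_prev, y, F, H, pi_var, sigma_var):
    """One pass through Eqs. 2.3-2.7 for scalar x and y."""
    x_pred = F * x_prev                        # Eq. 2.3: predicted state
    M = F * N_prev * F + pi_var                # Eq. 2.4: predicted covariance
    K_f = M * H / (H * M * H + sigma_var)      # Eq. 2.5: gain (inverse -> division)
    x_filt = x_pred + K_f * (y - H * x_pred)   # Eq. 2.6: corrected state
    N = (1.0 - K_f * H) * M                    # Eq. 2.7: corrected covariance
    return x_filt, N, K_f

# Worked example: M = 1, K_f = 1/2, so the estimate moves halfway to y.
x_filt, N, K_f = kalman_step(x_prev=0.0, N_prev=1.0, y=2.0,
                             F=1.0, H=1.0, pi_var=0.0, sigma_var=1.0)
```

Note that `K_f` and `N` do not depend on `y`; only the state estimate does.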

Combining equations 2.3 and 2.6, the following filter-equation emerges:

$$\hat{x}(t+1|t+1) = F\hat{x}(t|t) + K^f_{t+1}\left(y_{t+1} - H F \hat{x}(t|t)\right) \tag{2.8}$$

with $F\hat{x}(t|t) = \hat{x}(t+1|t)$.

One may also introduce the so-called prediction equation:

$$\hat{x}(t+1|t) = F\hat{x}(t|t-1) + K^p_t\left(y_t - H\hat{x}(t|t-1)\right) = F\hat{x}(t|t) \tag{2.9}$$

where $K^p_t = F K^f_t$, which can be used to make an estimation before the
$(t+1)$th measurement.

In the literature, the two different quantities, i.e., $K^f_t$ (Eq. 2.5) and
$K^p_t = F K^f_t$ (Eq. 2.9), are both called Kalman-gains.
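The relation between the two gains can be written out explicitly. The short derivation below is a sketch that uses only the symbols of Lemma 1, together with the identity $H F \hat{x}(t-1|t-1) = H\hat{x}(t|t-1)$ that follows from Eq. 2.3:

```latex
\begin{align*}
\hat{x}(t+1|t) &= F\hat{x}(t|t)
  && \text{(Eq. 2.3 at time } t+1\text{)} \\
 &= F\bigl[\hat{x}(t|t-1) + K^f_t\bigl(y_t - H\hat{x}(t|t-1)\bigr)\bigr]
  && \text{(Eq. 2.6)} \\
 &= F\hat{x}(t|t-1) + \underbrace{F K^f_t}_{K^p_t}\bigl(y_t - H\hat{x}(t|t-1)\bigr)
\end{align*}
```

The prediction gain therefore differs from the filter gain only by one application of the dynamics matrix $F$.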

2.3 Review of the recursive prediction error method

Here, we shall consider the filtering task and the learning task. This latter will 
be restricted to the learning of parameters required for proper adjustment of 
the Kalman-gain. 

Let us make the following notes. Iterations of $K^f_t$, $K^p_t$, $M_t$, and
$N_t$ are independent of the measurements $y_1, \ldots, y_t$. However, these
quantities depend on the quantities $\{H, F, \Sigma, \Pi\}$. The dependencies,
at first sight, do not seem to admit a neuronal form. The problem is in the
computation of the Kalman-gain (Eq. 2.5): this equation includes a matrix
inversion, which does not admit a connectionist (i.e., artificial neuronal)
network form. Moreover, it seems unlikely that prior knowledge of the covariance
matrices $\Sigma$ and $\Pi$ can be assumed for neuronal systems.

In the procedure that follows, we shall assume that the Kalman-gain constitutes
the unknown parameter of the system. The Kalman-gain will be estimated on-line
using the measured values of $y_t$ (i.e., the input of the neuronal
architecture). An on-line estimation (i) makes use of the estimated parameters,
(ii) optimizes the activities given the actual observations, and (iii) updates
the estimated parameters given the optimized activities and the previous
estimations; on-line methods are therefore well suited to a changing world.
The price is that on-line estimation may not be optimal for all past
observations (this is not desirable in a changing world, anyway). Instead,
on-line estimation becomes optimal asymptotically. It is well known that under
rather mild conditions, both Kalman-gains, i.e., $K^p_t$ and $K^f_t$, converge
with exponential speed to the asymptotic values $K^p$ and $K^f$, respectively
(i.e., $\lim_{t \to \infty} K_t = K$). In turn, on-line estimation is an
attractive alternative when the goal is the estimation of the parameter(s) of
the Kalman-gain. We shall substitute equations 2.3-2.7 by the on-line methods.
We shall apply the recursive prediction error method ([9]) to derive a
recursive estimation for the Kalman-gain.
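The convergence of the gain mentioned above can be illustrated numerically. The sketch below iterates Eqs. 2.4, 2.5 and 2.7 for a scalar system; no measurement enters the loop, which reflects the fact that the gain sequence is independent of the data. The function name `gain_sequence` and the parameter values are assumptions made for the illustration, and for this particular choice ($F = H = \Pi = \Sigma = 1$) the fixed point of the recursion works out to $(\sqrt{5} - 1)/2$.

```python
def gain_sequence(F, H, pi_var, sigma_var, N0, steps):
    """Iterate Eqs. 2.4, 2.5 and 2.7 (scalar case) and record K^f_t."""
    N = N0
    gains = []
    for _ in range(steps):
        M = F * N * F + pi_var                 # Eq. 2.4
        K_f = M * H / (H * M * H + sigma_var)  # Eq. 2.5
        N = (1.0 - K_f * H) * M                # Eq. 2.7
        gains.append(K_f)
    return gains

# Start far from the fixed point (N0 = 10) to make the transient visible.
gains = gain_sequence(F=1.0, H=1.0, pi_var=1.0, sigma_var=1.0, N0=10.0, steps=50)

# Fixed point of the recursion for these particular parameter values.
K_inf = (5 ** 0.5 - 1) / 2
errors = [abs(k - K_inf) for k in gains]
```

Inspecting `errors` shows the distance to the asymptotic gain shrinking by a roughly constant factor per step, which is what exponential convergence means here.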

Let $K = K(\theta)$ denote the parametrization of $K$, where $K(\theta)$ may
depend on $\theta$ arbitrarily. We shall use this freedom to choose a particular
form of the dependence. Now, our task is to estimate $\theta$ and then to
compute $K(\theta)$ using the estimated value of $\theta$. For the sake of
simplicity, variable $\theta$ will be scalar in the derivation below. The same
derivation follows for vector and matrix variables. Also, the derivation
concerns the estimation of matrix $K^p$, but a similar derivation can be
provided for matrix $K^f$.

First, the prediction equation (Eq. 2.9) shall be investigated. Let
$\hat{x}(t, \theta)$ denote the predictive estimation of the hidden variables
belonging to $\theta$. Equation 2.9 can be written as

$$\hat{x}(t+1, \theta) = F\hat{x}(t, \theta) + K^p(\theta)\left(y(t) - H\hat{x}(t, \theta)\right) \tag{2.10}$$

The goal is to find the value of $\theta$ that minimizes the squared norm of the
reconstruction error vector, which is defined as
$e(t, \theta) = y(t) - H\hat{x}(t, \theta)$. Vector $H\hat{x}(t, \theta)$, which
is derived from the hidden variables and should match the input in squared norm,
will be called the reconstructed input. By definition, the reconstruction error
vector is a stochastic variable with zero mean and covariance matrix $\Lambda$
($\mathcal{N}(0, \Lambda)$). For the purpose of on-line estimation, the
covariance matrix $\Lambda$ needs to be estimated from the data. The said goal,
in mathematical terms, has the following form:

$$\hat{\theta} = \arg\min_{\theta} E\left[\tfrac{1}{2}\, e(t, \theta)^T \Lambda^{-1} e(t, \theta)\right] \tag{2.11}$$

Equation 2.11 can be seen as a maximum likelihood problem. We shall apply
the following procedure: (i) the expectation value will be estimated by sample
averaging, i.e., stochastic gradient approximation will be used, and (ii) the
cost will be minimized with respect to $\theta$ along the gradient. Let us make
use of the notation $w(t, \theta) = \frac{\partial}{\partial\theta}\hat{x}(t, \theta)$
and let $\hat{y}(t, \theta) = H\hat{x}(t, \theta)$ denote the reconstructed
input. Minimization of Eq. 2.11 leads to
$\theta(t) = \theta(t-1) + \gamma(t) v(t)^T \Lambda^{-1} e(t)$, where
$\gamma(t) > 0$ is the learning rate, shorthand $v(t, \theta)$ is given as
$v(t, \theta) = -\frac{\partial}{\partial\theta}\left(y(t) - \hat{y}(t, \theta)\right)$,
and a recursive equation can be derived for $w(t+1, \theta)$ from Eq. 2.10 by
differentiating it with respect to $\theta$. The following recursive estimation
can be obtained for the Kalman-gain matrix $K$:

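Lemma 2 (Local Kalman-filter recursion) Assume that $\hat{x}(t)$, $w(t)$,
$\theta(t)$, and $\Lambda^{-1}(t)$ are given. Then

$$\hat{y}(t) = H\hat{x}(t) \tag{2.12}$$
$$e(t) = y(t) - \hat{y}(t) \tag{2.13}$$
$$\hat{x}(t+1) = F\hat{x}(t) + K(\theta(t))e(t) \tag{2.14}$$
$$v(t) = Hw(t) \tag{2.15}$$
$$w(t+1) = Fw(t) + K'(\theta(t))e(t) - K(\theta(t))v(t) \tag{2.16}$$
$$\theta(t+1) = \theta(t) + \gamma(t) v^T(t) \Lambda^{-1}(t) e(t) \tag{2.17}$$
$$\Lambda(t+1) = \Lambda(t) + \gamma(t)\left[e(t)e(t)^T - \Lambda(t)\right] \tag{2.18}$$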



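where $K'$ denotes the derivative of $K$ with respect to $\theta$. The auxiliary
vector $w$, which can be derived from the hidden vector $\hat{x}$ by
differentiation with respect to $\theta$, will play a crucial role in providing
the neuronal representation. For later purposes, we rewrite Eq. 2.18 into the
following stable form:

$$\Lambda^{-1}(t+1) = \Lambda^{-1}(t) + \gamma(t)\left[\Lambda^{-1}(t) - \left(\Lambda^{-1}(t)e(t)\right)\left(\Lambda^{-1}(t)e(t)\right)^T\right] \tag{2.19}$$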



In both recursions, i.e., in Eq. 2.18 and in Eq. 2.19, matrix $\Lambda$ is an
estimation of the covariance matrix of the reconstruction error vector $e$. In
recursion Eq. 2.19, the inverse of the correlation matrix is estimated directly.
Update Eq. 2.18 is a signal-Hebbian learning rule. Update Eq. 2.19 has a
different form and can be seen as a neural update if spike-time dependent
synaptic plasticity is taken into account, as will be described later.
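A single pass through the local recursion of Lemma 2 can be sketched for scalar quantities as follows, with the inverse covariance updated by Eq. 2.19. The function name `rpe_step`, the callables `K` and `K_prime`, and the linear parametrization $K(\theta) = \theta$ used in the example call are illustrative assumptions; the text leaves the form of $K(\theta)$ open at this point.

```python
def rpe_step(x_hat, w, theta, lam_inv, y, F, H, gamma, K, K_prime):
    """One pass through Eqs. 2.12-2.17 and 2.19 for scalar quantities.

    K and K_prime are callables returning the gain and its derivative
    with respect to theta; lam_inv stands for Lambda^{-1}.
    """
    y_hat = H * x_hat                                    # Eq. 2.12
    e = y - y_hat                                        # Eq. 2.13
    v = H * w                                            # Eq. 2.15
    x_next = F * x_hat + K(theta) * e                    # Eq. 2.14
    w_next = F * w + K_prime(theta) * e - K(theta) * v   # Eq. 2.16
    theta_next = theta + gamma * v * lam_inv * e         # Eq. 2.17
    u = lam_inv * e                                      # Lambda^{-1} e
    lam_inv_next = lam_inv + gamma * (lam_inv - u * u)   # Eq. 2.19
    return x_next, w_next, theta_next, lam_inv_next

# Example call with a hypothetical linear parametrization K(theta) = theta.
x1, w1, theta1, lam_inv1 = rpe_step(
    x_hat=0.0, w=1.0, theta=0.5, lam_inv=1.0, y=2.0,
    F=1.0, H=1.0, gamma=0.1,
    K=lambda th: th, K_prime=lambda th: 1.0,
)
```

Every line uses only quantities available at time $t$, and no matrix inversion (here: no division) appears anywhere in the step.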



3 Local Kalman-filter 



Now, let us choose an element of matrix $K$, say $K_{ij}$. In what follows, we
shall make the particular assumption that $K_{ij}$ is an exponential function of
$\theta$. This assumption allows us to simplify the architecture considerably.
The simplification is warranted by the particular property of the exponential
dependence that $K'_{ij} = K_{ij}$, which will be important when the local
connectionist architecture is presented: we can use matrix $K$ for both
purposes, i.e., for $K$ itself and for $K'$. In our formulation, the recursive
correction of the Kalman-gain assumes the multiplying exponentiated gradient
form ([20,21]):

$$K_{ij}(\theta(t+1)) = \exp\left(\gamma(t) v^T(t) \Lambda^{-1}(t) e(t)\right) K_{ij}(\theta(t)) \tag{3.1}$$

Now, Eq. 2.16 can be written as

$$w(t+1) = Fw(t) + K(\theta(t))\left(e(t) - v(t)\right) \tag{3.2}$$
$$= Fw(t) + K(\theta(t))\left(e(t) - Hw(t)\right) \tag{3.3}$$
$$= K(\theta(t))e(t) + \left(F - K(\theta(t))H\right) w(t) \tag{3.4}$$
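The three properties used in this section can be checked numerically on a scalar example. The sketch assumes the hypothetical form $K(\theta) = K_0 e^{\theta}$ with an arbitrary constant `K0`; the text only states that the dependence is exponential, so the constant and all numerical values are illustrative.

```python
import math

def gain(theta, K0):
    """Hypothetical exponential parametrization K(theta) = K0 * exp(theta)."""
    return K0 * math.exp(theta)

# 'step' stands for the scalar gamma * v^T Lambda^{-1} e of Eq. 2.17.
K0, theta, step = 0.3, 0.2, 0.05

# An additive update of theta (Eq. 2.17) is a multiplicative update of K (Eq. 3.1).
K_additive = gain(theta + step, K0)
K_multiplicative = math.exp(step) * gain(theta, K0)

# K' = K for the exponential form: compare with a central finite difference.
h = 1e-6
K_prime_numeric = (gain(theta + h, K0) - gain(theta - h, K0)) / (2 * h)

# Eq. 3.2 and Eq. 3.4 give the same w(t+1) (scalar F, H, w, e).
F, H, w, e = 0.9, 1.0, 0.4, 1.5
K = gain(theta, K0)
w_32 = F * w + K * (e - H * w)
w_34 = K * e + (F - K * H) * w
```

Because $K' = K$, the same set of connections can serve both the state update (Eq. 2.14) and the update of the auxiliary vector $w$.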



3.1 Architecture




Fig. 2. Computation of reconstruction error e 

Difference between two vector quantities can be computed by an ordered array of 
connections of inhibitory type. 

The reconstruction error vector ($e$) is computed as the difference between the
input ($y$) and the estimated input $\hat{y}$. The circuitry is shown in Fig. 2.

Computation of the parameter $\theta$ of the Kalman-gain requires the vector
$\Lambda^{-1}e$. To this end, vector $e$ is to be multiplied by the inverse of
the correlation matrix $\Lambda$ of the $e$ vectors themselves. Matrix $\Lambda$
is to be learned by the RPE method.

Fig. 3. Learning and multiplying by the inverse of correlation matrix $\Lambda$

There is a recurrent collateral system (a full connectivity excitatory feedback
structure) at the output layer. (Only a few connections are shown.) Noise is
present and accidental coincidences between excitatory inputs and neuronal
firing can give rise to self-strengthening proportional to the connection
strength [5]. This effect provides the first (positive) term of Eq. 2.19. We
assume that the recurrent collaterals have considerable delays and that synapses
are weakened because of the spike-time dependent synaptic plasticity process.
The weakening effect is proportional to the postsynaptic activities and to the
activities carried by the recurrent collaterals: this effect provides the
negative term of Eq. 2.19. In turn, the architecture executes the update of
Eq. 2.19 and multiplies the input (i.e., $e$) by $\Lambda^{-1}$. For details,
see text.

Learning can be accomplished by Hebbian means as depicted in Fig. 3: the
reconstruction error layer provides input for an auxiliary neural sub-layer.
This layer will output vector $\Lambda^{-1}e$. There is a full connectivity
excitatory feedback structure at this sub-layer. It is assumed that noise is
present in the system and that this noise gives rise to self-strengthening for
all feedback connections. This self-strengthening effect is assumed to be
proportional to the connection strength, i.e., it is responsible for the first
(positive) term of Eq. 2.19. It is assumed, too, that this feedback structure,
called recurrent collaterals, has considerable delays: feedback arrives at the
excitatory synapses late, and weakening of the synapses occurs because of the
STDP process. The weakening is proportional to the postsynaptic activity and to
the activity carried by the recurrent collateral and, in turn, the negative term
of Eq. 2.19 emerges. The emerging full update corresponds to Eq. 2.19.
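The two terms of Eq. 2.19 can be made explicit in a small matrix sketch. Here `P` stands for $\Lambda^{-1}$, the helper names `matvec` and `inverse_covariance_step` are hypothetical, and the four error samples are an artificial data set chosen so that their covariance is $\mathrm{diag}(0.5, 2)$ and the target inverse is $\mathrm{diag}(2, 0.5)$. Cycling deterministically through the samples is an assumption made to keep the example reproducible; it is not how errors would arrive in the architecture.

```python
def matvec(P, e):
    """Matrix-vector product: the output u = P e of the auxiliary sub-layer."""
    return [sum(P[i][j] * e[j] for j in range(len(e))) for i in range(len(P))]

def inverse_covariance_step(P, e, gamma):
    """Eq. 2.19 with P standing for Lambda^{-1}.

    positive term: gamma * P        (self-strengthening, proportional to the weight)
    negative term: gamma * u u^T    (weakening, product of two output activities)
    """
    u = matvec(P, e)
    n = len(e)
    P_next = [[P[i][j] + gamma * (P[i][j] - u[i] * u[j]) for j in range(n)]
              for i in range(n)]
    return P_next, u

# Artificial error samples with covariance diag(0.5, 2).
samples = [(1.0, 0.0), (-1.0, 0.0), (0.0, 2.0), (0.0, -2.0)]

P = [[1.0, 0.0], [0.0, 1.0]]   # initial guess for Lambda^{-1}
for t in range(4000):
    P, u = inverse_covariance_step(P, samples[t % 4], gamma=0.01)
```

After the loop, `P` stays close to the inverse covariance, with a small residual wobble set by the constant learning rate, and `u` is the vector $\Lambda^{-1}e$ that the sub-layer passes on.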

For proper outputs, however, an additional structure is required: the
excitatory feedback effect should act only once. We assume that an augmenting
set of inhibitory neurons exists (these are not shown in Fig. 3) and that these
inhibitory neurons are excited by the same auxiliary neural sub-layer. Moreover,
it is assumed that connections originating from the inhibitory neurons target
the feedback excitatory synapses of the auxiliary neural sub-layer. The extra
step needed to excite an inhibitory layer delays the inhibitory effect relative
to the excitatory effect at the feedback excitatory synapses. Inhibition stops
the excitatory effect, so excitatory feedback can occur only once in each
iteration step. It is intriguing that this complex excitatory-inhibitory
structure does exist in the cortex ([22]). It then follows that (i) the
circuitry of Fig. 3 performs the RPE computation prescribed by Eq. 2.19 and
(ii) the output of the sub-layer is equal to $\Lambda^{-1}e$.

Fig. 4. Computation and influence of parameter $\theta$, the parameter of the
Kalman-gain (figure labels: $w$, $K(\theta)$, $y$)

Component-by-component multiplication of vector $v$ (Eq. 2.15) and vector
$\Lambda^{-1}e$ is performed and the components are summed up in a separate
sub-layer. The output of this layer influences the Kalman-gain via synapses of
multiplying type. These synapses influence the outputs of neurons of the
reconstruction error vector.

Last, the parameter of the Kalman-gain is to be computed. The computation
consists of three parts. First, vector $v$ (Eq. 2.15) and vector $\Lambda^{-1}e$
are to be multiplied component-by-component. Then the individual terms need to
be summed to correct the previous estimation of the parameter of the
Kalman-gain. Finally, the re-estimated parameter influences each outgoing
connection in a uniform but non-linear fashion. This non-linear dynamic effect
is the attenuation process of the Kalman-filter, i.e., the tuning of the
Kalman-gain. We made the assumption that this non-linear influence is, in fact,
exponential. The corresponding circuitry is shown in Fig. 4. Although a
connectionist model is constructed here, it may be important to emphasize that
the Kalman-gain influences the output of the reconstruction error layer in a
uniform fashion. It is then possible that not a full layer, but a single neuron
(or a few neurons) (i) computes the scalar product of vectors $v$ and
$\Lambda^{-1}e$ and (ii) targets the neurons of the reconstruction error layer
to tune the Kalman-gain as required for Kalman-filtering.

The distinct parts of the RPE computation can be made recursive by means 
of a loop architecture. The loop architecture is depicted in Fig. 5. 




Fig. 5. Architecture to execute the recursive prediction error method 

The exponentiated dependence on the parameter of the Kalman-gain allows
for a simple loop structure. The computations of Figs. 2-4 together perform
Kalman-filtering in the depicted loop as prescribed by Eqs. 2.12-3.4. Solid
arrows: excitatory connections. Empty arrow: inhibitory connections. The
recursive prediction error method proceeds as follows: neural activities are
given at time $t$ at each layer. To compute the estimation at time $t+1$, all
activities are propagated simultaneously through the connections represented by
the arrows of the figure. The emerging constraints are as follows: (i) There
should be an associative matrix connecting components of the estimation of the
hidden variables and this matrix should be equal to $F$. (ii) There should also
be an associative matrix connecting components of the estimation of auxiliary
vector $w$ and this matrix should be equal to $F - KH$.
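The loop as a whole can be sketched for a scalar system. This is an illustration under several assumptions that are not in the original text: the gain is taken as $K = e^{\theta}$, the error variance is tracked with Eq. 2.18 rather than Eq. 2.19 (in the scalar case this keeps the estimate positive for any $0 < \gamma < 1$), the learning rate is constant, and the data come from a simulated LDS with $F = 0.9$, $H = \Pi = \Sigma = 1$. The function name `run_local_kalman` and the initial values are hypothetical.

```python
import math
import random

def run_local_kalman(ys, F, H, gamma, theta0=math.log(0.1)):
    """Scalar loop of Eqs. 2.12-2.15, 2.17, 2.18 and 3.2 with K = exp(theta)."""
    x_hat, w, theta, lam = 0.0, 0.0, theta0, 1.0
    gains = []
    for y in ys:
        K = math.exp(theta)           # current gain, always positive
        e = y - H * x_hat             # Eqs. 2.12-2.13: reconstruction error
        v = H * w                     # Eq. 2.15
        u = e / lam                   # scalar version of Lambda^{-1} e
        x_hat = F * x_hat + K * e     # Eq. 2.14: hidden state estimate
        w = F * w + K * (e - v)       # Eq. 3.2 (uses K' = K)
        theta += gamma * v * u        # Eq. 2.17: parameter of the gain
        lam += gamma * (e * e - lam)  # Eq. 2.18: error variance estimate
        gains.append(K)
    return gains

# Simulated observations from a scalar LDS (illustrative values).
rng = random.Random(1)
F, H = 0.9, 1.0
x, ys = 0.0, []
for _ in range(20000):
    ys.append(H * x + rng.gauss(0.0, 1.0))
    x = F * x + rng.gauss(0.0, 1.0)

gains = run_local_kalman(ys, F, H, gamma=0.005)
tail_mean = sum(gains[-1000:]) / 1000
```

For these parameter values the steady-state prediction gain obtained from Eqs. 2.4-2.7 is about 0.54, which gives a reference point for `tail_mean`; the sketch makes no claim about how fast or how closely the adapted gain approaches it for other settings.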

Figures 2-5 demonstrate that Kalman-filtering can be performed by local
means in a loop through the RPE method. This observation supports top-down
modelling efforts, which have appeared recently in the literature
([1,2,3,4,5,6]). The top-down modelling suggestion, namely that Kalman-filtering
may play a role in cortical computations, calls for reinforcement from the
bottom-up modelling methods of computational neuroscience. Our work makes a step
in this direction; we have transformed the suggestions of top-down modelling
into a dynamical neural architecture with appealing Hebbian learning rules. The
particular form of Kalman-filtering we presented here poses several questions.
We list a few of them: (i) How can this architecture be mapped onto the
neocortex? (ii) How can the derived computations be performed in the neural
substrate? (iii) In particular, how reasonable is our assumption that the
Kalman-gain depends exponentially on its parameter? These questions, which
can be seen as the predictions of our model, remain open and call for further
studies.



4 Conclusions 

In this paper we have presented a neural (connectionist) representation for
Kalman-filtering. The issue is worth considering given the recent advances in
the literature ([2,4,6]). We feel that the cited works share a common problem:
Kalman-filtering should be given a local connectionist architecture for the
proposed top-down approaches. Here, we have shown that a connectionist
representation is indeed possible for Kalman-filtering: the recursive
prediction error method suits the requirements imposed by Hebbian constraints.
Moreover, RPE provides an appealing on-line learning scheme. This is a
promising start. However, the mapping of the architecture to neocortical areas
and the question of whether the predictive matrices of the Kalman-filter could
be learned by Hebbian means remain open. Also, further studies starting from
parameters of the neuronal substrate are needed to explore the particular
predictions of the RPE model of Kalman-filtering.